General instructions for all assignments:
R Markdown file (named as: [AndrewID]-HW09.Rmd – e.g. “sventura-HW09.Rmd”) to the Homework 09 submission section on Blackboard. You do not need to upload the .html file.(4 points)
Organization, Themes, and HTML Output
warning = FALSE and message = FALSE in every code block.ggplot theme and use of color:ggplot() color scheme.color = "black").library(tidyverse)
library(data.table)
library(forcats)
# Simple theme with white background, legend at the bottom
my_theme <- theme_bw() +
theme(axis.text = element_text(size = 12, color = "indianred4"),
text = element_text(size = 14, face = "bold", color = "darkslategrey"))
# Colorblind-friendly color palette
my_colors <- c("#000000", "#56B4E9", "#E69F00", "#F0E442", "#009E73", "#0072B2",
"#D55E00", "#CC7947")
(3 points each)
Sometimes, seemingly rational methods of data visualization may not be the most effective way of presenting data. Even though dividing up a full range of data for a variable into equally sized bins might seem like a good idea, this method can be misleading if there are several extreme outliers or if a small number of bins has a disproportionately large/small number of observations. When considering how to visualize big distinctions in the data, like positive versus negative growth, it is often useful to use a bivariate color distribution to emphasize the distinction compared to using a single color gradient.
The standard Mercator projection map of the world that we use today does not accurately reflect relative sizes of land masses and seas, particularly for Antarctica and Greenland. A new way of visualizing the world map developed by architect and artist Hajime Narukawa known as the AuthaGraph map projection method won the 2016 Good Design Award for its ability to faithfully preserve the relative sizes of 96 different regions of the world. As issues like climate change, melting glaciers in Greenland, and territorial sea claims become more relevant, this new perspective of the world may be helpful in ensuring that all interests of the planet are equally represented.
ggmap uses a two-step process to create interesting and informative graphics in ggplot. First, the get_map() function is used to download and format an image for plotting from a variety of online sources given a latitude/longitude pair and some sort of location information such as a zoom or bounding box. Next, ggmap() plots the downloaded image as a context layer using ggplot2. Since ggmap() returns a ggplot object, the product of ggmap() can act as a base layer for the entire ggplot2 framework.
You can use stat_density2d(fill = ..level..) along with scale_fill_gradient() to add a heatmap layer to a ggmap base.
(2 points each)
library(ggmap)
get_map() creates a ggmap object for plotting by querying a map server given location and zoom arguments. The different map sources that can be used in get_map() are Google Maps, OpenStreetMap, Stamen Maps, and Naver Map.
The zoom parameter determines the bounds of the viewing window for the map we are plotting. It takes in values between 3 (continent) and 21 (building) with a default value of 10 (city). A viewing window of 1 square mile would require a zoom level of approximately 16-18.
We can choose the types “terrain”, “terrain-background”, “satellite”, “roadmap”, and “hybrid”, “watercolor”, and “toner.” The “hybrid” type is unique to Google Maps.
The map shows Carnegie Mellon University, Phipps, and part of Schenley Park pulled from Google Maps. The direct center of the map seems to be Baker Hall.
In get_map(), ‘location’ specifies where the center of the map should be, ‘color’ specifies that the map should be in color, ‘source’ specifies that the map should be pulled from Google Maps, ‘maptype’ specifies that the map should use the “hybrid” map theme, and ‘zoom’ specifies that the viewing window should be roughly a square mile.
In ggmap(), ‘extent’ specifies how large the map should be relative to the plotting device, and ‘xlab’ and ‘ylab’ label the x and y axes.
library(ggmap)
map_base <- get_map(location = c(lon = -79.944248, lat = 40.4415861),
color = "color",
source = "google",
maptype = "hybrid",
zoom = 16)
map_object <- ggmap(map_base,
extent = "device",
ylab = "Latitude",
xlab = "Longitude")
map_object
Using a non-integer input for the zoom parameter returns an error stating that “zoom must be a whole number between 1 and 21”.
# class(map_object)
map_object is a ggplot object, which inherits from class gg.
(2 points each)
The latitude is 40.7588995, and the longitude is -73.9873197.
The latitude and longitude should not change when the zoom is changed. This makes sense since the latitude and longitude specify the center of the viewing window, which does not change as we zoom in or out.
map_base_NYC <- get_map(location = c(lon = -73.9873197, lat = 40.758899),
color = "bw",
source = "google",
maptype = "roadmap",
zoom = 12)
map_object_NYC <- ggmap(map_base_NYC,
extent = "device",
ylab = "Latitude",
xlab = "Longitude")
map_object_NYC
The output of map_object_NYC is a greyscale map, pulled from Google Maps, which shows road and highway information around the Manhattan, Brooklyn, and Jersey City areas.
For both problems below, it may help to look over:
Both files are located under Course Content.
(30 points total)
Mapping US Flights
airports <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat",
col_names = c("ID", "name", "city", "country", "IATA_FAA",
"ICAO", "lat", "lon", "altitude", "timezone", "DST"))
routes <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat",
col_names = c("airline", "airlineID", "sourceAirport",
"sourceAirportID", "destinationAirport",
"destinationAirportID", "codeshare", "stops",
"equipment"))
departures <- routes %>%
dplyr::group_by(sourceAirportID) %>%
dplyr::summarize(flights = n()) %>%
mutate(sourceAirportID = as.integer(as.vector(sourceAirportID)))
arrivals <- routes %>%
dplyr::group_by(destinationAirportID) %>%
dplyr::summarize(flights = n()) %>%
mutate(destinationAirportID = as.integer(as.vector(destinationAirportID)))
# Merge each of the arrivals/departures data.frames with the airports data.frame above
airportD <- left_join(airports, departures, by = c("ID" = "sourceAirportID"))
airportA <- left_join(airports, arrivals, by = c("ID" = "destinationAirportID"))
map <- get_map(location = 'United States', zoom = 4)
mapPoints <- ggmap(map) +
geom_point(aes(x = lon, y = lat, size = flights),
data = airportA) +
ggtitle("Location of Airports Sized by Number of Arriving Flights")
# Add a custom legend to the plot
mapPointsLegend <- mapPoints +
scale_size_area(breaks = c(10, 50, 100, 500, 900),
labels =c(10, 50, 100, 500, 900),
name = "Number of Arriving Routes")
mapPointsLegend
my_airport_code <- "LAX"
lax_routes <- dplyr::filter(routes,
sourceAirport == my_airport_code |
destinationAirport == my_airport_code)
lax_airport <- lax_routes %>%
left_join(airports, by = c("sourceAirport" = "IATA_FAA")) %>%
dplyr::select(destinationAirport, lat, lon, timezone) %>%
dplyr::rename(source_lat = lat, source_lon = lon, source_timezone = timezone) %>%
left_join(airports, by = c("destinationAirport" = "IATA_FAA")) %>%
dplyr::select(source_lat, source_lon, source_timezone, lat, lon, timezone) %>%
dplyr::rename(dest_lat = lat, dest_lon = lon, dest_timezone = timezone)
mapPointsLegend +
geom_segment(aes(x = source_lon, y = source_lat, xend = dest_lon,
yend = dest_lat), data = lax_airport, alpha=.15) +
labs(x = "Longitude", y = "Latitude",
title = "Flights To and From Los Angeles") +
theme_void()
mapPointsLegend +
geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
yend = dest_lat), data = lax_airport,
arrow = arrow(length = unit(0.02, "npc")), alpha=.15) +
labs(x = "Longitude", y = "Latitude",
title = "Flights To and From Los Angeles") +
coord_cartesian()
lax_airport$change_timezone <- lax_airport$source_timezone - lax_airport$dest_timezone
lax_airport <- lax_airport[which(abs(lax_airport$change_timezone) <= 3), ]
# Filter by routes that wil be shown on map
mapPointsLegend +
geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
yend = dest_lat, color = change_timezone), data = lax_airport,
arrow = arrow(length = unit(0.02, "npc"))) +
labs(x = "Longitude", y = "Latitude",
title = "Flights To and From Los Angeles",
color = "Change in \nTime Zone \nin Hours") +
coord_cartesian() +
scale_color_gradient(low = "blue", high = "red")
(40 points)
Choropleth Maps of Rent Prices
rent <- read_csv("https://raw.githubusercontent.com/sventura/315-code-and-datasets/master/data/price.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## City = col_character(),
## Metro = col_character(),
## County = col_character(),
## State = col_character()
## )
## See spec(...) for full column specifications.
rent_jan2017 <- rent %>%
select(County, State, `January 2017`) %>%
rename(jan_2017 = `January 2017`) %>%
arrange(State) %>%
group_by(State) %>%
summarize(mean_rent = mean(jan_2017))
state_data <- data_frame(state.abb, state.name) %>%
mutate(state.name = tolower(state.name)) %>%
left_join(rent_jan2017, by = c("state.abb" = "State"))
state_borders <- map_data("state") %>%
left_join(state_data, by = c("region" = "state.name"))
ggplot(state_borders, aes(x = long, y = lat, fill = mean_rent)) +
geom_polygon(aes(group = state.abb), color = "black") +
theme_void() +
coord_map("mercator") +
scale_fill_gradient2(high = "darkred", low = "darkblue",
mid = "white", midpoint = 1500) +
labs(title = "Mean Rent per State, Jan. 2017",
fill = "Mean Rent ($)")
The highest average rent cost states seem to be California, Massachusetts, and New Jersey, while the lowest rent cost states include Oklahoma, West Virginia, and Arkansas. Average rent prices seem to be higher around the Pacific coast and in New England, while they tend to be lower in the Deep South, the Midwest, and the Mountain time zone.
rent_jan2017 <- rent %>%
select(County, State, `January 2017`) %>%
rename(jan_2017 = `January 2017`) %>%
arrange(County) %>%
group_by(County) %>%
summarize(mean_rent = mean(jan_2017)) %>%
mutate(County = tolower(County))
county_borders <- map_data("county") %>%
left_join(rent_jan2017, by = c("subregion" = "County"))
ggplot(county_borders, aes(x = long, y = lat, fill = mean_rent)) +
geom_polygon(aes(group = group)) +
theme(plot.title = element_text(hjust = 0.5)) +
coord_map("mercator") +
scale_fill_gradient2(high = "darkred", low = "darkblue",
mid = "white", midpoint = 2000) +
labs(title = "Mean Rent per County, Jan. 2017",
fill = "Mean Rent ($)") +
theme_bw()
THe higher end of the county average monthy rents seem to predominantly be on the coasts. The above average monthly rents are concentrated in a few counties in Californa, New York, New Jersey, and Florida. There also seem to be a few high-rent counties states in the Midwest. These are significantly higher than most of the country; the rest of the counties seems to have average monthly rent below 2000 dollars per month, while these counties seem to be 3000 dollars per month and above. There seems to be a lot of missing data from the Great Plains states.
See the BonusProblems assignment on Blackboard.